Beyond accuracy: quantifying trial-by-trial behaviour of CNNs and humans by measuring error consistency
A central problem in cognitive science and behavioural neuroscience, as well as in machine learning and artificial intelligence research, is to ascertain whether two or more decision makers---be they brains or algorithms---use the same strategy. Accuracy alone cannot distinguish between strategies: two systems may achieve similar accuracy with very different strategies. The need to differentiate beyond accuracy is particularly pressing when two systems are at or near ceiling performance, as Convolutional Neural Networks (CNNs) and humans are on visual object recognition. Here we introduce trial-by-trial error consistency, a quantitative analysis for measuring whether two decision-making systems systematically make errors on the same inputs. Making consistent errors on a trial-by-trial basis is a necessary condition for ascertaining that two decision makers use similar processing strategies. Our analysis can compare algorithms with algorithms, humans with humans, and algorithms with humans. Applying error consistency to visual object recognition yields three main findings: (1.) Irrespective of architecture, CNNs are remarkably consistent with one another.
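The trial-by-trial measure described above can be sketched in a few lines: compare the observed fraction of trials on which both systems are jointly right or jointly wrong against the overlap expected from their accuracies alone, and normalize kappa-style. Function names and the toy data are illustrative, not the paper's code.

```python
import numpy as np

def error_consistency(correct_a, correct_b):
    """Kappa-style error consistency between two decision makers.

    correct_a, correct_b: boolean arrays, True where each system
    answered a trial correctly.
    """
    a = np.asarray(correct_a, dtype=bool)
    b = np.asarray(correct_b, dtype=bool)
    # Observed consistency: fraction of trials where both are right or both wrong.
    c_obs = np.mean(a == b)
    # Expected consistency if errors were independent, given each system's accuracy.
    p_a, p_b = a.mean(), b.mean()
    c_exp = p_a * p_b + (1 - p_a) * (1 - p_b)
    # 0 means chance-level overlap; 1 means identical error patterns.
    return (c_obs - c_exp) / (1 - c_exp)

# Two systems at 80% accuracy whose errors coincide on exactly the same trials:
a = np.array([1, 1, 1, 1, 0, 1, 1, 1, 1, 0], dtype=bool)
print(error_consistency(a, a))  # -> 1.0
```

Equal accuracies alone say nothing here: a second 80%-accurate system that errs on two *different* trials would score well below 1.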
How snake bites really work
Vipers can strike within 100 milliseconds of launching at their prey. A venomous snake bite is not something you ever want to encounter on a hiking or camping trip. But for the brave scientists who study snakes (a.k.a. herpetologists), the mechanics behind the reptiles' fast fangs are more fascinating than fear-inducing. Snakes must move incredibly quickly to sink their fangs into prey before the victim flinches.
Input-Time Scaling
Current Large Language Models (LLMs) are usually post-trained on large, carefully curated datasets (data & training scaling) and perform reasoning at test time (inference-time scaling). In this work, we present a new scaling paradigm, Input-Time Scaling, which complements previous scaling methods by putting resources into the queries themselves (input time). During training and testing, we utilize meta-knowledge from LLMs to refine inputs with different strategies. We also discover a new phenomenon, train-test co-design: query strategies must be applied during training and testing as a whole, and applying them only at training or only at testing seriously degrades the performance gained. We are also surprised to find that datasets of seemingly low quality can perform better. We obtain the best performance even by adding irrelevant information to the queries, using 1k randomly selected examples from a minimally filtered dataset. These findings contradict the widely held inductive bias "garbage in, garbage out"; curating datasets for seemingly high-quality data may even limit the performance ceiling. In addition, models trained on more data of similar quality (15k vs. 1k) perform worse, so the intuition of simply scaling dataset size should also be carefully inspected. The good news is that our findings are compatible with the Less-is-More phenomenon: 1k examples are enough to invoke high-level reasoning ability. With experiments on Qwen2.5-32B-Instruct, we reach SOTA performance among 32B models on AIME24 (76.7%) and AIME25 (76.7%) pass@1. We can further achieve AIME24 (76.7%) and AIME25 (80.0%) with a majority vote over three models. Starting from DeepSeek-R1-Distill-Qwen-32B, the results are 90.0% on AIME24 and 80.0% on AIME25. To facilitate reproducibility and further research, we are working on open-sourcing our datasets, data pipelines, evaluation results, and checkpoints.
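Two ingredients of the abstract lend themselves to a toy sketch: rewriting the query before the model sees it (input-time scaling) and aggregating answers by majority vote. The strategy names and functions below are invented for illustration and are not the paper's actual pipeline.

```python
from collections import Counter

def refine_query(query, strategy):
    """Spend compute on the *query* by rewriting it before inference.

    The two strategies here are hypothetical placeholders for the
    meta-knowledge-based refinements described in the abstract.
    """
    if strategy == "restate":
        return f"Restate the problem in your own words, then solve it:\n{query}"
    if strategy == "add_context":
        return f"{query}\n(Recall relevant definitions before answering.)"
    return query  # no-op fallback

def majority_vote(answers):
    """Pick the most common final answer among several model runs.

    Ties break in favour of the answer seen first.
    """
    return Counter(answers).most_common(1)[0][0]

# Three hypothetical final answers from three models:
print(majority_vote(["76", "80", "76"]))  # -> 76
```

The train-test co-design finding means `refine_query` would have to be applied to the training examples as well, not only at inference.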
RANet: Region Attention Network for Semantic Segmentation - Supplementary Material - Dingguo Shen
The first two authors contributed equally. Di Lin is the corresponding author of this paper. However, using the intermediate pixels requires extra computation. In Figure 3, we provide the segmentation results with and without using the intermediate pixels. In Table 2, we compare different strategies of using the representative scores in the region interaction. We also study the strategy of using only the representative scores in the region interaction.
A Poly24 Dataset
Poly24 is a dataset computed with Density Functional Theory and designed for calculating the enthalpy change of ring-opening polymerization (H
This process involves breaking a cyclic monomer ring and attaching the "opened" monomer to a growing chain, ultimately forming a polymer chain. The dataset was generated using Molecular Dynamics (MD) simulations of various monomer and polymer models at a consistent level of DFT [6, 9] computation. Leveraging the Polymer Structure Predictor (PSP) package [13], various polymer models were generated from a given cyclic monomer. Each model was obtained by replicating the monomer a small integer L times, for instance L = 3, 4, 5, and 6, thereby creating a loop of size L (larger loops model polymers more accurately). For every monomer or polymer model, approximately ten or more maximally diversified configurations were selected as starting points for the DFT-based MD simulations.
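Assuming the per-monomer enthalpy change is estimated from the loop and monomer energies in the usual finite-size way (total loop energy per unit, minus the isolated cyclic monomer's energy), the quantity can be sketched as below. The formula and the numbers are illustrative assumptions; the actual Poly24 protocol may differ.

```python
def ring_opening_enthalpy(e_loop, loop_size, e_monomer):
    """Estimate the per-monomer enthalpy change of ring-opening
    polymerization from a cyclic L-mer model.

    e_loop:    total DFT energy of the loop built from L monomer units
    loop_size: L, the number of monomer units in the loop
    e_monomer: DFT energy of the isolated cyclic monomer
    """
    return e_loop / loop_size - e_monomer

# Hypothetical DFT energies in arbitrary units, for an L = 4 loop:
print(ring_opening_enthalpy(-400.8, 4, -100.0))  # roughly -0.2 per monomer
```

Evaluating this at L = 3, 4, 5, 6 and extrapolating is what "larger loops model polymers more accurately" amounts to in this sketch.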
A Appendix
This is simple to see, as the ranks in the uneven depthwise case are computed per input while the merging is done per output. The proposed RED method is summarized in Algorithm 1. Note that we didn't describe the

Strategy             % removed parameters
linear descending    77.90
constant             78.69
linear ascending     80.35
block                84.52

The constant strategy provides the best results. Following the study from Section 5.2, where we wanted to empirically validate the effect of hashing a DNN, RED appears to be robust to dropout.
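One way to read the strategies in the table is as per-layer schedules for how much of the parameter budget to remove. The schedules below are purely illustrative guesses at what "constant", "linear ascending", and "linear descending" could mean; the exact schedules used by RED are not specified in this excerpt.

```python
def removal_schedule(num_layers, strategy, base=0.8):
    """Hypothetical per-layer removal fractions for each named strategy.

    base: the removal fraction the schedule is scaled around.
    """
    if strategy == "constant":
        # Same fraction removed in every layer.
        return [base] * num_layers
    if strategy == "linear ascending":
        # Remove little early, more in deeper layers.
        return [base * (i + 1) / num_layers for i in range(num_layers)]
    if strategy == "linear descending":
        # Remove a lot early, less in deeper layers.
        return [base * (num_layers - i) / num_layers for i in range(num_layers)]
    raise ValueError(f"unknown strategy: {strategy}")

print(removal_schedule(4, "constant"))  # -> [0.8, 0.8, 0.8, 0.8]
```

Under this reading, "constant" winning means a uniform budget per layer beats front- or back-loaded removal.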
Bayesian Flow Is All You Need to Sample Out-of-Distribution Chemical Spaces
Generating novel molecules with better properties than those in the training space, namely out-of-distribution generation, is important for $de~novo$ drug design. However, it is not easy for distribution-learning-based models, for example diffusion models, to solve this challenge, as these methods are designed to fit the distribution of the training data as closely as possible. In this paper, we show that the Bayesian flow network is capable of effortlessly generating high-quality out-of-distribution samples that meet several scenarios. We introduce a semi-autoregressive training/sampling method that helps enhance model performance and surpass state-of-the-art models.
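The semi-autoregressive idea, generating a sequence in blocks while conditioning each new block on everything produced so far, can be sketched generically. This is a structural sketch only, not the paper's actual Bayesian-flow sampler; `sample_block` stands in for whatever model produces the next block.

```python
def semi_autoregressive_sample(sample_block, seq_len, block_size):
    """Generate a sequence of length seq_len in blocks of block_size.

    sample_block(prefix, n) is assumed to return n new tokens,
    conditioned on the already-generated prefix. Within a block the
    tokens are produced jointly; across blocks, generation is
    autoregressive -- hence "semi-autoregressive".
    """
    seq = []
    while len(seq) < seq_len:
        n = min(block_size, seq_len - len(seq))  # last block may be short
        seq.extend(sample_block(seq, n))
    return seq

# Dummy "model" that just emits positional indices:
demo = semi_autoregressive_sample(
    lambda prefix, n: [len(prefix) + i for i in range(n)],
    seq_len=5, block_size=2)
print(demo)  # -> [0, 1, 2, 3, 4]
```

Setting block_size to 1 recovers fully autoregressive sampling; setting it to seq_len recovers one-shot joint sampling, so the block size interpolates between the two regimes.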